Policy Evaluation Using the Ω-Return
Authors
Abstract
We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation between returns of different lengths. Because the Ω-return is difficult to compute exactly, we suggest one way of approximating it. We provide empirical studies that suggest that the Ω-return is superior to the λ-return and γ-return for a variety of problems.
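As a rough illustration of the idea in the abstract, the sketch below contrasts the fixed geometric weights of the λ-return with weights derived from a covariance matrix of the n-step returns. All function names are hypothetical. The covariance-based weights are the generic minimum-variance combination of correlated estimates, used here only as an assumed stand-in for "accounting for correlation"; the abstract itself does not give the Ω-return's formula, and it notes that the exact quantity is hard to compute, so the covariance matrix is simply assumed to be known.

```python
def n_step_returns(rewards, values, gamma):
    """n-step returns from time 0 for one finite episode.

    rewards[t] is the reward received after step t; values[t] is the
    current value estimate at time t, and values[len(rewards)] is the
    terminal state's value (normally 0).
    """
    returns, discounted_sum = [], 0.0
    for n in range(1, len(rewards) + 1):
        discounted_sum += gamma ** (n - 1) * rewards[n - 1]
        returns.append(discounted_sum + gamma ** n * values[n])
    return returns


def lambda_weights(num_returns, lam):
    """lambda-return weights; the final (full) return takes the leftover mass."""
    weights = [(1 - lam) * lam ** n for n in range(num_returns - 1)]
    weights.append(lam ** (num_returns - 1))
    return weights


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(size):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * p for a, p in zip(aug[row], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def covariance_weights(cov):
    """Assumed illustration: minimum-variance weights that sum to 1,
    i.e. cov^-1 1 / (1^T cov^-1 1), for a known covariance matrix."""
    unnormalized = solve(cov, [1.0] * len(cov))
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def weighted_return(returns, weights):
    """Combine the n-step returns into a single target."""
    return sum(w * g for w, g in zip(weights, returns))
```

With an uncorrelated (diagonal) covariance matrix the second weighting reduces to inverse-variance weighting; off-diagonal entries are what let the combination discount returns that largely repeat one another's information.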
Similar resources
Pricing strategy and return policy of one-echelon green supply chain under both green and hybrid productions
In this paper, we investigate the pricing and return policy of a one-echelon green supply chain containing a manufacturer who produces two types of products: green and non-green. These products have the same function but differ in selling price and environmental impact. We also consider a return policy for both products that can stimulate customer valuation. We develo...
An Economic Evaluation of Iranian Horticultural Research and Extension Policy: The Case Study of Almond Late Flowering Cultivars
This paper examines the economic effects of investment in developing and introducing Almond Late Flowering Cultivars (ALFC), developed at the Sahand Horticultural Research Station (SHRS), over a period of 52 years from 1968 to 2020, using the economic surplus model and field survey data. ALFC make the almond supply curve shift less to the left when chilling occurs, and thus affect the economic surplus...
Prediction the Return Fluctuations with Artificial Neural Networks' Approach
Changes in returns over time, the inefficiency of the studies performed so far, and the presence of factors that affect the share return rate have driven the development of modern, intelligent methods for estimating and evaluating share returns in listed companies. The aim of this research is to predict returns from financial variables using an artificial neural network approach. Therefore, the statistical population of this study incl...
Optimizing pricing and ordering strategies in a three-level supply chain under return policy
This paper develops an economic production quantity model for a three-echelon supply chain composed of a supplier, a manufacturer, and a wholesaler under two scenarios. In the first scenario, we consider a return contract between the outside supplier and the supplier and also between the manufacturer and the wholesaler, but in the second one, the return policy between the manufacturer and the wh...
The Option-Critic Architecture
Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new ...